Open Vocabulary Object Detection with Proposal Mining and Prediction Equalization
Open-vocabulary object detection (OVD) aims to scale up vocabulary size to
detect objects of novel categories beyond the training vocabulary. Recent work
draws on the rich knowledge of pre-trained vision-language models. However,
existing methods remain ineffective at proposal-level vision-language alignment.
Meanwhile, such models usually suffer from a confidence bias toward base
categories and perform worse on novel ones. To overcome these challenges, we
present MEDet, a novel and effective OVD framework with proposal mining and
prediction equalization. First, we design an online proposal mining scheme to refine
the inherited vision-semantic knowledge from coarse to fine, allowing for
proposal-level detection-oriented feature alignment. Second, based on causal
inference theory, we introduce a class-wise backdoor adjustment to reinforce
the predictions on novel categories to improve the overall OVD performance.
Extensive experiments on COCO and LVIS benchmarks verify the superiority of
MEDet over the competing approaches in detecting objects of novel categories,
e.g., 32.6% AP50 on COCO and 22.4% mask mAP on LVIS.
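The class-wise backdoor adjustment is only named in the abstract, so the following is a minimal sketch of the general idea rather than MEDet's actual implementation: treat the empirical class frequency as a confounder prior and discount class scores by it, so that predictions on under-represented novel categories are reinforced relative to base ones. The function name, the hyperparameter `alpha`, and the exact form of the adjustment are all assumptions.

```python
import numpy as np

def backdoor_adjusted_scores(sim, class_prior, alpha=0.5):
    """Hypothetical sketch of a class-wise backdoor adjustment.

    sim: (N, C) proposal-to-class similarity scores.
    class_prior: (C,) empirical class frequency in training data,
        standing in for the confounder distribution P(c).
    alpha: assumed hyperparameter controlling adjustment strength.

    Idea: raw scores are biased toward frequent base classes, so we
    down-weight each class by its prior, moving the prediction from
    P(y | x) toward a de-confounded P(y | do(x)).
    """
    probs = np.exp(sim) / np.exp(sim).sum(axis=1, keepdims=True)
    adjusted = np.log(probs + 1e-12) - alpha * np.log(class_prior + 1e-12)
    # subtract the row-wise max for numerical stability; ranking is unchanged
    return adjusted - adjusted.max(axis=1, keepdims=True)
```

With a strongly imbalanced prior, the adjustment can flip a near-tie in favor of the rare (novel) class, which is the intended equalization effect.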
Revisiting Image Aesthetic Assessment via Self-Supervised Feature Learning
Visual aesthetic assessment has been an active research field for decades.
Although latest methods have achieved promising performance on benchmark
datasets, they typically rely on a large number of manual annotations including
both aesthetic labels and related image attributes. In this paper, we revisit
the problem of image aesthetic assessment from the self-supervised feature
learning perspective. Our motivation is that a suitable feature representation
for image aesthetic assessment should be able to distinguish different
expert-designed image manipulations, which have close relationships with
negative aesthetic effects. To this end, we design two novel pretext tasks to
identify the types and parameters of editing operations applied to synthetic
instances. The features from our pretext tasks are then adapted for a one-layer
linear classifier to evaluate the performance in terms of binary aesthetic
classification. We conduct extensive quantitative experiments on three
benchmark datasets and demonstrate that our approach can faithfully extract
aesthetics-aware features and outperform alternative pretext schemes. Moreover,
we achieve comparable results to state-of-the-art supervised methods that use
10 million labels from ImageNet.
Comment: accepted to the AAAI Conference on Artificial Intelligence, 2020.
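The abstract describes pretext tasks that identify the type and parameter of an editing operation applied to a synthetic instance. A minimal sketch of how such pretext labels might be generated follows; the operation set, the parameter binning, and all names are assumptions for illustration, not the paper's actual design.

```python
import numpy as np

# Assumed set of aesthetics-degrading edit operations.
OPS = ["brightness", "contrast", "noise"]

def make_pretext_sample(img, rng):
    """Degrade a clean image (float array in [0, 1]) with one random
    operation and return the result plus the pretext targets:
    the operation type and a discretized strength parameter."""
    op_id = int(rng.integers(len(OPS)))
    level = int(rng.integers(3))       # coarse parameter bin: 0, 1, 2
    strength = 0.2 * (level + 1)
    if OPS[op_id] == "brightness":
        out = np.clip(img + strength, 0.0, 1.0)
    elif OPS[op_id] == "contrast":
        out = np.clip((img - 0.5) * (1.0 + strength) + 0.5, 0.0, 1.0)
    else:  # additive Gaussian noise
        out = np.clip(img + rng.normal(0.0, strength, img.shape), 0.0, 1.0)
    # a network trained to predict (op_id, level) from `out` learns
    # features sensitive to aesthetics-degrading manipulations
    return out, op_id, level
```

The features learned this way are then frozen and evaluated with a one-layer linear classifier, as the abstract describes.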
Centroid-Aware Local Discriminative Metric Learning in Speaker Verification
We propose a new mechanism to pave the way for efficient learning against class imbalance and to improve the representation of identity vectors (i-vectors) in automatic speaker verification (ASV). The insight is to effectively exploit the inherent structure within the ASV corpus — centroid priors. In particular: (1) to ensure learning efficiency against class imbalance, centroid-aware balanced boosting sampling is proposed to collect balanced mini-batches; (2) to strengthen local discriminative modeling on the mini-batches, neighborhood component analysis (NCA) and magnet loss (MNL) are adopted with ASV-specific modifications. The integration yields adaptive NCA (AdaNCA) and linear MNL (LMNL). Numerical results show that LMNL is a competitive candidate for low-dimensional projection of i-vectors (EER = 3.84% on SRE2008, EER = 1.81% on SRE2010), enjoying a competitive edge over linear discriminant analysis (LDA). AdaNCA (EER = 4.03% on SRE2008, EER = 2.05% on SRE2010) also performs well. Furthermore, to facilitate future study of boosting sampling, connections between boosting sampling, hinge loss, and data augmentation have been established, which help further explain the behavior of boosting sampling.
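The centroid-aware balanced boosting sampling is described only at a high level. The sketch below shows the plain balanced-sampling core — equal utterances per selected speaker in each mini-batch — which is what counters class imbalance; the centroid-aware boosting weights are omitted, and all names are assumed.

```python
import numpy as np
from collections import defaultdict

def balanced_minibatch(labels, classes_per_batch, samples_per_class, rng):
    """Collect a class-balanced mini-batch of dataset indices.

    labels: sequence of per-utterance speaker labels.
    Returns a list of indices with exactly `samples_per_class` items
    for each of `classes_per_batch` randomly chosen speakers.
    """
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    # only speakers with enough utterances are eligible
    eligible = [c for c, idxs in by_class.items()
                if len(idxs) >= samples_per_class]
    chosen = rng.choice(eligible, size=classes_per_batch, replace=False)
    batch = []
    for c in chosen:
        batch.extend(rng.choice(by_class[c], size=samples_per_class,
                                replace=False))
    return batch
```

Each batch is balanced by construction, so losses such as NCA or magnet loss see every selected speaker equally often regardless of corpus-level imbalance.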
Generalizable Representation Learning for Mixture Domain Face Anti-Spoofing
Face anti-spoofing based on domain generalization (DG) has drawn growing attention due to its robustness to unseen scenarios. Existing DG methods assume that the domain label is known. However, in real-world applications, the collected dataset often contains a mixture of domains, where the domain label is unknown, and most existing methods fail in this setting. Further, even when the domain labels assumed by existing methods are available, we argue that such a partition is merely sub-optimal. To overcome this limitation, we propose domain dynamic adjustment meta-learning (DAM), which requires no domain labels: it iteratively divides mixture domains via discriminative domain representations and trains a generalizable face anti-spoofing model with meta-learning. Specifically, we design a domain feature based on Instance Normalization (IN) and propose a domain representation learning module (DRLM) to extract discriminative domain features for clustering. Moreover, to reduce the side effect of outliers on clustering performance, we additionally utilize maximum mean discrepancy (MMD) to align the distribution of sample features with a prior distribution, which improves the reliability of clustering. Extensive experiments show that the proposed method outperforms conventional DG-based face anti-spoofing methods, including those utilizing domain labels. Furthermore, we enhance interpretability through visualization.
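The MMD term above measures how far the sample-feature distribution is from the prior it is aligned with. A minimal sketch of the standard (biased) MMD estimator with an RBF kernel, assuming `sigma` as a free bandwidth parameter (the paper's exact kernel and estimator are not specified in the abstract):

```python
import numpy as np

def mmd_rbf(x, y, sigma=1.0):
    """Biased MMD^2 estimate between two feature sets with an RBF kernel.

    x: (n, d) sample features; y: (m, d) draws from the prior distribution.
    Returns 0 when the two sets are identical and grows as they diverge.
    """
    def k(a, b):
        # pairwise squared distances via broadcasting, then RBF kernel
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2.0 * k(x, y).mean()
```

Minimizing this quantity over the feature extractor pulls the sample features toward the prior, which is the outlier-suppression role MMD plays in the clustering step described above.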